SMS Spam Filtering using Probabilistic Topic Modelling and Stacked Denoising Autoencoder

Abstract

In this paper we present a novel approach to spam filtering and demonstrate its applicability to SMS messages. Our approach requires minimal feature engineering and only a small set of labelled data samples. Features are extracted using topic modelling based on latent Dirichlet allocation, and a comprehensive data model is then created using a Stacked Denoising Autoencoder (SDA). Topic modelling summarises the data, providing ease of use and high interpretability through word-cloud visualisations of the topics. Given that SMS messages can be regarded as either spam (unwanted) or ham (wanted), the SDA is able to model the messages and accurately discriminate between the two classes without the need for a pre-labelled training set. The results are compared against state-of-the-art spam detection algorithms; our proposed approach achieves over 97% accuracy, which compares favourably with the best algorithms reported in the literature.
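
The feature-extraction step described in the abstract, in which each message is represented by its topic proportions under latent Dirichlet allocation, can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything here is an assumption made for illustration: the collapsed Gibbs sampler, the function name `lda_topic_features`, the hyperparameters `alpha` and `beta`, the number of topics, and the four invented toy messages. The SDA stage is not shown.

```python
import random

def lda_topic_features(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative, not the paper's code).

    Returns one topic-proportion vector per tokenised document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * V for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Assign a random initial topic to every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta) / (topic_total[k] + V * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed, normalised topic proportions per document.
    features = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        features.append([(doc_topic[d][k] + alpha) / denom for k in range(n_topics)])
    return features


# Invented toy messages, already tokenised; not taken from the paper's data.
messages = [
    "win free prize call now".split(),
    "free cash prize claim now".split(),
    "are we meeting for lunch today".split(),
    "see you at lunch today".split(),
]
features = lda_topic_features(messages)
for row in features:
    print([round(p, 3) for p in row])
```

Each row is a low-dimensional vector that sums to 1 and could serve as input to a downstream model such as an autoencoder. In practice a library implementation of LDA would normally replace this hand-written sampler.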